SMOTE for Regression

نویسندگان

  • Luís Torgo
  • Rita P. Ribeiro
  • Bernhard Pfahringer
  • Paula Branco
چکیده

Several real world prediction problems involve forecasting rare values of a target variable. When this variable is nominal we have a problem of class imbalance that was already studied thoroughly within machine learning. For regression tasks, where the target variable is continuous, few works exist addressing this type of problem. Still, important application areas involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of tasks. Namely, we propose to address such tasks by sampling approaches. These approaches change the distribution of the given training data set to decrease the problem of imbalance between the rare target cases and the most frequent ones. We present a modification of the well-known Smote algorithm that allows its use on these regression tasks. In an extensive set of experiments we provide empirical evidence for the superiority of our proposals for these particular regression tasks. The proposed SmoteR method can be used with any existing regression algorithm turning it into a general tool for addressing problems of forecasting rare extreme values of a continuous target variable.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique

Abstract—Analysing unbalanced datasets is one of the challenges that practitioners in machine learning field face. However, many researches have been carried out to determine the effectiveness of the use of the synthetic minority over-sampling technique (SMOTE) to address this issue. The aim of this study was therefore to compare the effectiveness of the SMOTE over different models on unbalance...

متن کامل

Geometric SMOTE: Effective oversampling for imbalanced learning through a geometric extension of SMOTE

Classification of imbalanced datasets is a challenging task for standard algorithms. Although many methods exist to address this problem in different ways, generating artificial data for the minority class is a more general approach compared to algorithmic modifications. SMOTE algorithm and its variations generate synthetic samples along a line segment that joins minority class instances. In th...

متن کامل

Predicting credit card customer churn in banks using data mining

In this paper, we solve the customer credit card churn prediction via data mining. We developed an ensemble system incorporating majority voting and involving Multilayer Perceptron (MLP), Logistic Regression (LR), decision trees (J48), Random Forest (RF), Radial Basis Function (RBF) network and Support Vector Machine (SVM) as the constituents. The dataset was taken from the Business Intelligenc...

متن کامل

Preprocessing noisy imbalanced datasets using SMOTE enhanced with fuzzy rough prototype selection

The Synthetic Minority Over Sampling TEchnique (SMOTE) is a widely used technique to balance imbalanced data. In this paper we focus on improving SMOTE in the presence of class noise. Many improvements of SMOTE have been proposed, mostly cleaning or improving the data after applying SMOTE. Our approach differs from these approaches by the fact that it cleans the data before applying SMOTE, such...

متن کامل

An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data

BACKGROUND Breast cancer is one of the most critical cancers and is a major cause of cancer death among women. It is essential to know the survivability of the patients in order to ease the decision making process regarding medical treatment and financial preparation. Recently, the breast cancer data sets have been imbalanced (i.e., the number of survival patients outnumbers the number of non-s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013